Discriminative prediction of mammalian enhancers from DNA sequence.
Abstract
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers.
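As a rough illustration of what scoring a sequence with "general sequence features" could look like, the sketch below turns a DNA sequence into a normalized k-mer count vector and scores it with a linear decision function of the kind a linear SVM produces. The abstract does not specify the feature set or the kernel, so the use of k-mers, the merging of reverse complements, the choice of k, and every function name and weight here are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter
import math

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Merge a k-mer with its reverse complement (assumed convention)."""
    return min(kmer, reverse_complement(kmer))


def kmer_features(seq, k=6):
    """Map a sequence to a unit-length vector of canonical k-mer counts.

    k-mers containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {kmer: c / norm for kmer, c in counts.items()}


def linear_score(features, weights, bias=0.0):
    """Linear decision value w.x + b; a positive value suggests 'enhancer'.

    In practice the weights would come from a trained SVM; large positive
    weights mark enriched k-mers and large negative weights depleted ones.
    """
    return sum(weights.get(kmer, 0.0) * v for kmer, v in features.items()) + bias


if __name__ == "__main__":
    # Hypothetical weights, invented purely to exercise the functions.
    toy_weights = {"AC": 1.0, "CG": -0.5}
    feats = kmer_features("ACGTAC", k=2)
    print(sorted(feats.items()))
    print(linear_score(feats, toy_weights))
```

Because each k-mer is merged with its reverse complement, a sequence and its reverse complement receive the same feature vector, which seems a natural choice when strand orientation is unknown.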
Similar resources
Comparison of the Lipophosphoglycan 3 Gene of the Lizard and Mammalian Leishmania: A Homology Modeling
Background: Lipophosphoglycan 3 (LPG3) is required for the LPG assembly, a well known virulent molecule. In this study, the LPG3 gene of the lizard and mammalian Leishmania species were cloned and sequenced. A three-dimensional structure (3D) for the target sequence was also predicted by comparative (homology) modeling. Materials and Methods: An optimization PCR amplification was performed o...
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data ...
In silico cloning and bioinformatics study of Brucella melitensis Omp31 antigen in different mammalian expression vectors
Brucella melitensis, as a pathogenic gram-negative intracellular bacterium, causes brucellosis in animals and humans. According to literature, the B. melitensis outer membrane protein 31 (Omp31) is considered as an important vaccine candidate against brucellosis. The aim of the current study was to compare three different expression constructs containing B. melitensis Omp31 antigen using bioinf...
Integrating Diverse Datasets Improves Developmental Enhancer Prediction
Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. Enha...
Genomic organization and evolution of immunoglobulin kappa gene enhancers and kappa deleting element in mammals.
We have studied the genomic structure and evolutionary pattern of immunoglobulin kappa deleting element (KDE) and three kappa enhancers (KE5', KE3'P, and KE3'D) in eleven mammalian genomic sequences. Our results show that the relative positions and the genomic organization of the KDE and the kappa enhancers are conserved in all mammals studied and have not been affected by the local rearrangeme...
Journal title:
- Genome Research
Volume 21, Issue 12
Pages: -
Publication date: 2011